1. INTRODUCTION

We congratulate the authors on a review of con-
vergence rates for Gibbs sampling routines. Their
combined work on studying convergence rates via 
orthogonal polynomials in the present paper under 
discussion (which we will denote as DKSC from here 
onward), via coupling in Diaconis, Khare and Saloff- 
Coste (2006), and for multivariate samplers in 
Khare and Zhou (2008), enhances the toolbox of the- 
oretical convergence analysis. This has the potential 
of opening new avenues of pursuit for gauging chain 
convergence in practice, and optimally implement- 
ing Gibbs sampler strategies. In this discussion, we 
focus on the latter, within the context of the random 
scan Gibbs sampler presented in DKSC. Although 
the analysis in DKSC does not seem to extend to the 
random scan implementation we consider, a study of 
convergence rate and estimator precision is possible, 
in theory, for special cases as well as in general prac- 
tice. Our aim is to motivate further research within 
the context of DKSC to identify objective criteria 
for optimizing implementation of the random scan 
Gibbs sampler. 

2. REVISITING RANDOM SCAN GIBBS 
SAMPLERS 

The random scan Gibbs sampler considered in
DKSC has an equal likelihood of visiting each
coordinate, $(x, \theta)$, during an iteration of the sampler.
As put forth by the seminal convergence theory work
of Liu, Wong and Kong (1995) and discussed more
recently by Levine and Casella (2006), an optimal
implementation of the random scan strategy may
visit less often those components whose marginal is
easier to understand or describe. For example, in the
bivariate cases of DKSC, each iteration of the random
scan visits $x$ with probability $\alpha_1$ and $\theta$ with
probability $1 - \alpha_1$, where $\alpha_1 \in (0, 1)$ is not
necessarily equal to 0.5. For the general multivariate
problem of sampling a $d$-vector $\mathbf{X}$, the random sweep
strategy is characterized by selection probabilities
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)$, where $\sum_{i=1}^{d} \alpha_i = 1$ and $\alpha_i$ is
not necessarily equal to $1/d$ for all $i$.

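In the notation of DKSC, the transition kernel
of the random scan Gibbs sampler for a function
$g \in L^2(P)$ is

$$\bar{K}g(x,\theta) = \alpha_1 \int g(x,\theta')\,\pi(\theta' \mid x)\,\pi(d\theta') + (1-\alpha_1) \int_{\mathcal{X}} g(x',\theta)\,f_{\theta}(x')\,\mu(dx'). \tag{1}$$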

Unfortunately, $\bar{K}$ in (1) is not readily diagonalizable,
as the decomposition in the proof of DKSC Theo-
rem 3.1, part (c), relies on the equal selection prob-
abilities ($\alpha_1 = 0.5$) to partition the transition kernel
acting on appropriate functions $g$. However, in the
cases of discrete state spaces and Gaussian target 
distributions, both considered in the exposition of 
DKSC, we may identify explicit convergence rates 
and optimally choose selection probabilities. In the 
following sections, we elaborate on these findings 
and present an alternative approach with estima- 
tor precision as an objective criterion. We also sug- 
gest avenues for future research within the context 
of DKSC to address the random scan Gibbs sampler 
decision problem. 

3. CONVERGENCE RATES 

Convergence rates of Gibbs sampling routines may
be formulated in two special cases: Gaussian and
discrete target distributions. DKSC Section 6.3 alludes
to the case of Gaussian distributions, citing the
work of Goodman and Sokal (1989), which expresses the
convergence rate as the largest eigenvalue of a matrix
related to the dispersion matrix and an autoregressive
transition of the Markov chain (see
Khare and Zhou (2008), as well). Amit (1996) and
Roberts and Sahu (2001) provide an alternative
expression that lends itself well to our analysis of random
scan Gibbs samplers. In particular, Levine et al.
(2005) show that for a $d$-dimensional Gaussian target
distribution with $d$-vector zero mean and dispersion
matrix $\Sigma$, $N_d(0, \Sigma)$, the random scan Gibbs
sampler has convergence rate $\rho(I - \Psi S R)$, where
$\rho(\cdot)$ is the spectral radius (maximum modulus
eigenvalue), $\Psi = \mathrm{diag}(\alpha_1, \ldots, \alpha_d)$, $R = \Sigma^{-1}$ and
$S = \mathrm{diag}(1/r_{11}, \ldots, 1/r_{dd})$, with $r_{ii}$ the $(i,i)$th, or $i$th
diagonal, element of $R$. Note that $\rho(I - \Psi S R)$ is a
function of the selection probabilities and thus may
be used as an objective criterion for optimal choice
of $\alpha$.

Consider a different form of the Gaussian example
of DKSC Section 4.3, along the lines of Section 6.3: a
bivariate Gaussian target distribution with bivariate
mean of zero, standard deviations $\sigma_1$ and $\sigma_2$, and
correlation $\rho$. The convergence rate is

$$\lambda_{\alpha} = 0.5\left\{1 + \sqrt{1 + 4\alpha_1^2(1-\rho^2) - 4\alpha_1(1-\rho^2)}\right\}.$$

Interestingly, the covariance structure, with
covariance $r = \rho\sigma_1\sigma_2$, leaves the convergence rate as a
function of the correlation $\rho$ and not the variance
components. The random scan with equal selection
probabilities, $\alpha_1 = 0.5$, has the smallest convergence
rate over the range of standard deviations and
correlation.
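The closed-form rate can be checked numerically against the spectral-radius expression. The following is a minimal sketch in plain Python, not code from any of the cited papers; the function names are ours, and the matrix is written out by hand for the two-dimensional case, where $SR$ has unit diagonal and off-diagonal entries $-\rho\sigma_i/\sigma_j$.

```python
import math


def random_scan_rate_bivariate(sigma1, sigma2, rho, alpha1):
    """Spectral radius of I - diag(alpha) S R for a zero-mean bivariate Gaussian.

    R is the inverse dispersion matrix and S = diag(1/r_11, 1/r_22), so S R has
    ones on the diagonal and -rho * sigma_i / sigma_j off the diagonal.
    """
    alpha2 = 1.0 - alpha1
    b11 = 1.0 - alpha1
    b12 = alpha1 * rho * sigma1 / sigma2
    b21 = alpha2 * rho * sigma2 / sigma1
    b22 = 1.0 - alpha2
    trace = b11 + b22
    det = b11 * b22 - b12 * b21
    # The discriminant equals 1 - 4*alpha1*alpha2*(1 - rho^2) >= rho^2,
    # so both eigenvalues are real; max() only guards against rounding.
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


def closed_form_rate(rho, alpha1):
    """Closed-form bivariate rate as a function of rho and alpha1 only."""
    inner = 1.0 + 4.0 * alpha1 ** 2 * (1.0 - rho ** 2) - 4.0 * alpha1 * (1.0 - rho ** 2)
    return 0.5 * (1.0 + math.sqrt(max(inner, 0.0)))


def best_alpha1(sigma1, sigma2, rho, steps=100):
    """Grid search for the selection probability with the smallest rate."""
    grid = [k / steps for k in range(1, steps)]
    return min(grid, key=lambda a: random_scan_rate_bivariate(sigma1, sigma2, rho, a))
```

For instance, `random_scan_rate_bivariate(2.0, 1.0, 0.5, 0.3)` and `random_scan_rate_bivariate(10.0, 0.1, 0.5, 0.3)` agree with `closed_form_rate(0.5, 0.3)`, illustrating that the standard deviations drop out of the rate.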

In the case of multivariate Gaussian distributions, 
the random scan with equal selection probabilities is 
not necessarily optimal with respect to convergence 
rate. For example, consider a trivariate Gaussian 
distribution with zero mean vector and dispersion 
matrix 

$$\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2) - \{1/(d + 0.005)\}\,J,$$

where $J$ is a matrix of ones, an exchangeable
correlation structure considered by Roberts and Sahu
(2001) and Levine et al. (2005). In the case $\sigma_1 = 10$
and $\sigma_2 = \sigma_3 = 1$, the random scan Gibbs sampler
with $\alpha = (0.22, 0.39, 0.39)$ has the smallest
convergence rate. Nonetheless, the gain in rate over the
random scan with equal selection probabilities is less
than 10%. Levine et al. (2005) provide further
illustrations of optimal random scan Gibbs samplers
for multivariate Gaussian target distributions where
non-equal selection probabilities minimize the
convergence rate. However, the computational cost
of identifying the selection probabilities that
minimize the convergence rate is often not sufficiently
offset by the gain in convergence speed.

In the case of discrete state spaces, Frigessi et al.
(1993) show that for a transition matrix $P_{rs}$, the
convergence rate is the second largest eigenvalue in
modulus, $\rho_2(P_{rs}) = \max\{|\lambda| : \lambda \text{ an eigenvalue of } P_{rs},\ \lambda \neq 1\}$,
the largest eigenvalue being equal to one.
Note that $\rho_2(P_{rs})$ is a function of the selection
probabilities and thus may be used as an objective
criterion for optimal choice of $\alpha$.
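As an illustration of how this criterion might be evaluated on a small state space, the sketch below builds the random scan transition matrix for a two-coordinate discrete target and approximates the second largest eigenvalue by power iteration. It is a sketch under stated assumptions rather than the procedure of the cited papers: the function names and the dictionary representation of the joint distribution are ours, the joint probabilities are assumed strictly positive, and the iteration relies on the random scan Gibbs kernel being reversible and positive semi-definite in $L^2(\pi)$ (a mixture of conditional expectation operators), so that the second largest eigenvalue in modulus is also the second largest eigenvalue.

```python
def random_scan_matrix(joint, alpha1):
    """Random scan Gibbs transition matrix for a two-coordinate discrete target.

    joint maps states (x1, x2) to strictly positive probabilities; alpha1 is the
    probability of updating the first coordinate at each iteration.
    """
    states = sorted(joint)
    index = {s: k for k, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for (x1, x2) in states:
        row = P[index[(x1, x2)]]
        # Update x1 from its full conditional given x2.
        norm1 = sum(p for (a, b), p in joint.items() if b == x2)
        # Update x2 from its full conditional given x1.
        norm2 = sum(p for (a, b), p in joint.items() if a == x1)
        for (a, b), p in joint.items():
            if b == x2:
                row[index[(a, b)]] += alpha1 * p / norm1
            if a == x1:
                row[index[(a, b)]] += (1.0 - alpha1) * p / norm2
    return states, P


def second_eigenvalue(P, pi, iters=2000):
    """Power iteration on functions with zero mean under pi.

    Returns the Rayleigh quotient in L2(pi), which approximates the second
    largest eigenvalue when P is reversible and positive semi-definite.
    """
    n = len(P)
    f = [float(k + 1) ** 2 for k in range(n)]  # arbitrary non-constant start
    lam = 0.0
    for _ in range(iters):
        mean = sum(pi[k] * f[k] for k in range(n))
        f = [v - mean for v in f]  # remove the constant eigenfunction
        norm = sum(pi[k] * f[k] ** 2 for k in range(n)) ** 0.5
        if norm == 0.0:
            return 0.0
        f = [v / norm for v in f]
        Pf = [sum(P[i][j] * f[j] for j in range(n)) for i in range(n)]
        lam = sum(pi[k] * f[k] * Pf[k] for k in range(n))
        f = Pf
    return lam


def discrete_rate(joint, alpha1):
    """Approximate rho_2(P_rs) for the given selection probability."""
    states, P = random_scan_matrix(joint, alpha1)
    return second_eigenvalue(P, [joint[s] for s in states])
```

A grid search such as `min(grid, key=lambda a: discrete_rate(joint, a))` then gives the selection probability with the smallest rate on that grid. For a target with independent coordinates the rate reduces to $\max(\alpha_1, 1 - \alpha_1)$, which serves as a simple sanity check.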

Consider the binomial example of DKSC Section 5.1
where, at iteration $t$, $\theta \mid X_{t-1} \sim \mathrm{hypergeometric}(n_1, n_2, X_{t-1})$
and $X = \theta_{t-1} + \varepsilon$ with $\varepsilon \sim \mathrm{binomial}(n_2, p)$, and
marginally $X \sim \mathrm{binomial}(n_1 + n_2, p)$ and
$\theta \sim \mathrm{binomial}(n_1, p)$. Of course, the cardinality of the state
space is a function of $n_1$ and $n_2$, so a closed form
expression of the convergence rate as a function of
$\alpha_1$ is not available. However, for given $n_1$ and $n_2$,
we may easily minimize $\rho_2(P_{rs})$ with respect to the
selection probabilities. Empirical evidence suggests
that the random scan with equal selection
probabilities, $\alpha_1 = 0.5$, leads to the smallest convergence
rate.

Levine (2005) provides illustrations of optimal
random scan Gibbs samplers for multivariate discrete
target distributions. As with multivariate Gaussian
target distributions, the random scan with equal
selection probabilities is non-optimal with respect to
convergence rate; however, the loss in convergence
speed is minimal. The random scan Gibbs sampler
analyses in DKSC for bivariate chains, and that of 
Khare and Zhou (2008) for multivariate target dis- 
tributions, may thus be a worthwhile pursuit, focus- 
ing exclusively on uniform visitation of coordinates. 
We will discuss this matter more below. 

4. ASYMPTOTIC VARIANCE 

We have seen that if the optimality criterion is 
convergence rate, it is often the case that there is 
only minimal gain in using the optimal random scan 
rather than the equal probability scan. The story 
is not the same, however, if we shift attention to 
estimator precision as the objective criterion. 

An alternative means of choosing random scan
selection probabilities is through a study of
estimator precision. Suppose interest lies in
estimating $E_{\pi}\{h(\mathbf{X})\}$ for a function $h \in L^2(\pi)$, where $\pi$ is
the distribution of the $d$-vector $\mathbf{X}$. The natural
estimator of this expected value is the sample mean,
$(1/m)\sum_{i=1}^{m} h(\mathbf{X}_i)$, of $m$ variates generated by the
random scan Gibbs sampler. We may thus identify
the best scan strategy through a minimization of the
asymptotic variance

$$R(\alpha, h) = \lim_{m \to \infty} m \,\mathrm{Var}\left\{\frac{1}{m}\sum_{i=1}^{m} h(\mathbf{X}_i)\right\}. \tag{2}$$

Levine and Casella (2006) show that the two-lag au- 
tocovariance in the asymptotic variance expansion 
may be presented as the square of the convergence 
rate, relating these two objective criteria. (See 
Chen, Liu and Wang (2002), for more details on this 
relationship.) Levine and Casella (2006) also show
that $R(\alpha, h)$ is a polynomial in $\alpha$. For the sake of space,
we will not duplicate the expressions here. However,
optimization of the asymptotic variance over the se- 
lection probabilities is feasible, particularly in the 
case of Gaussian and discrete target distributions. 
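
For reference, the expansion mentioned above can be written out in its standard form. This is a sketch under assumptions added here: the chain is started in stationarity and the autocovariances are absolutely summable. The symbol $\gamma_k$ is introduced only for this display.

$$R(\alpha, h) = \gamma_0 + 2\sum_{k=1}^{\infty} \gamma_k, \qquad \gamma_k = \mathrm{Cov}_{\pi}\{h(\mathbf{X}_0), h(\mathbf{X}_k)\}.$$

Here $\gamma_0 = \mathrm{Var}_{\pi}\{h(\mathbf{X})\}$ depends only on the target distribution, so the selection probabilities affect the asymptotic variance only through the lagged terms $\gamma_k$, $k \geq 1$.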

Consider again the bivariate Gaussian example
(DKSC Sections 4.3 and 6.3). A second-order
approximation of the asymptotic variance (2) for
linear functions $h(\mathbf{X})$ identifies optimal random scans
with non-equal selection probabilities, following the
intuition presented earlier of visiting more often the
most variable coordinate. For example, in the case
of estimating the sum of the coordinates, the
asymptotic variance is

$$\begin{aligned}
R(\alpha, h) = {} & \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2 + \alpha_1(\rho\sigma_1 + \sigma_2)^2 \\
& + (1-\alpha_1)(\sigma_1 + \rho\sigma_2)^2 + \alpha_1^2(\rho\sigma_1 + \sigma_2)^2 \\
& + (1-\alpha_1)^2(\sigma_1 + \rho\sigma_2)^2 \\
& + 2\alpha_1(1-\alpha_1)(\rho\sigma_1 + \sigma_2)(\sigma_1 + \rho\sigma_2)\rho.
\end{aligned}$$

If the standard deviations are $\sigma_1 = 2$ and $\sigma_2 = 1$
with correlation $\rho = 0.5$, the scan that minimizes
the asymptotic variance has $\alpha_1 = 0.93$.
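
The reported minimizer can be reproduced by evaluating the expression above on a grid. The sketch below is illustrative only: the function names and the grid resolution are our choices, and the expression is coded as the sum of the lag-zero, lag-one and lag-two terms for the sum of the coordinates.

```python
def second_order_variance(alpha1, sigma1, sigma2, rho):
    """Second-order approximation of R(alpha, h) for h(X) = X1 + X2."""
    a = rho * sigma1 + sigma2  # factor attached to updates of the first coordinate
    b = sigma1 + rho * sigma2  # factor attached to updates of the second coordinate
    lag0 = sigma1 ** 2 + sigma2 ** 2 + 2.0 * rho * sigma1 * sigma2
    lag1 = alpha1 * a ** 2 + (1.0 - alpha1) * b ** 2
    lag2 = (alpha1 ** 2 * a ** 2
            + (1.0 - alpha1) ** 2 * b ** 2
            + 2.0 * alpha1 * (1.0 - alpha1) * a * b * rho)
    return lag0 + lag1 + lag2


def minimizing_alpha1(sigma1, sigma2, rho, steps=1000):
    """Grid search over alpha1 in [0, 1] for the smallest approximate variance."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda a1: second_order_variance(a1, sigma1, sigma2, rho))
```

With $\sigma_1 = 2$, $\sigma_2 = 1$ and $\rho = 0.5$, the terms involving $\alpha_1$ collapse to the quadratic $12.5 - 9.75\alpha_1 + 5.25\alpha_1^2$, whose minimizer $9.75/10.5 \approx 0.93$ agrees with the value quoted in the text.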

In the case of discrete state spaces, Peskun (1973)
shows that the asymptotic variance is
$R(\alpha, h) = h(2BZ - B - BA)h^{T}$, where $A$ is a matrix with
each row containing the vector of stationary
distribution probabilities $\pi$, $B$ is a diagonal matrix
with $\pi$ on the diagonal, $h$ is a vector of the
function $h$ applied to each element of the state space,
and $Z = \{I - (P_{rs} - A)\}^{-1}$ is the fundamental matrix,
with identity matrix $I$. Consider again the binomial
example of DKSC Section 5.1. As in the Gaussian
case, minimization of this asymptotic risk over $\alpha_1$
identifies optimal random scans that visit the most
variable coordinate at a higher frequency. For
example, if parameters are set at $n_1 = 6$, $n_2 = 3$, and
$p = 0.5$, the scan that minimizes the asymptotic
variance has $\alpha_1 = 0.56$.
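
The matrix expression for the asymptotic variance can be evaluated without forming $Z$ explicitly, by solving one linear system. The following is a minimal sketch in plain Python, assuming a finite state space with transition matrix `P`, stationary vector `pi` and function values `h` supplied as lists. The helper names are ours, and the small Gaussian elimination routine merely stands in for any linear solver.

```python
def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def asymptotic_variance(P, pi, h):
    """Evaluate h (2BZ - B - BA) h^T with Z = {I - (P - A)}^{-1}."""
    n = len(pi)
    # I - (P - A), where every row of A equals the stationary vector pi.
    M = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    z = solve_linear(M, h)  # z = Z h^T
    mean = sum(pi[i] * h[i] for i in range(n))
    h_bz_h = sum(pi[i] * h[i] * z[i] for i in range(n))
    h_b_h = sum(pi[i] * h[i] ** 2 for i in range(n))
    # h B A h^T equals the squared stationary mean of h.
    return 2.0 * h_bz_h - h_b_h - mean ** 2
```

Two sanity checks: when every row of `P` equals `pi` (independent sampling) the function returns the stationary variance of $h$, and for the symmetric two-state chain with switching probability 0.25 and $h = (0, 1)$ it returns 0.75. Passing a random scan matrix $P_{rs}$ computed for different values of $\alpha_1$ gives the criterion to be minimized.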

Levine et al. (2005) and Levine and Casella (2006) 
show that these optimal random scans present sig- 
nificant improvement over a random scan with equal 
selection probabilities, not only in asymptotic vari- 
ance but also in chain mixing. 

5. IMPLEMENTATIONS 

For general applications of the random scan Gibbs 
sampler to multivariate target distributions, neither 
the convergence rate nor the asymptotic variance 
may necessarily be available in closed form.
However, two implementations have been proposed to
choose optimal selection probabilities in practice.
Levine et al. (2005) suggest using a Gaussian
approximation to the target distribution to determine
the optimal random scan, perhaps in a tuning phase 
of the sampler or as an adaptive procedure. Since 
the convergence rate and asymptotic variance are 
accessible under a Gaussian target distribution, sev- 
eral adaptive and non-adaptive random scan Gibbs 
sampler algorithms present themselves. 

Levine and Casella (2006) propose an adaptive ran- 
dom scan Gibbs sampler which chooses optimal se- 
lection probabilities "on the fly," learning and adapt- 
ing the sweep strategy as the chain traverses the 
state space. The induced chain is no longer Markov 
but still converges to the desired equilibrium distri- 
bution. In the most general form, the adaptive strat- 
egy identifies a minimax random scan for the set 
of selection probabilities that minimizes the asymp- 
totic variance for the worst possible function of in- 
terest. 

The optimal random scan Gibbs samplers determined
with respect to the convergence rate and asymptotic
variance, though potentially identifying different
sets of selection probabilities, are not
contradictory. As suggested by Mira (2001) and discussed
further in Levine et al. (2005), the convergence rate
criterion is most desirable during the burn-in period
of the Markov chain, whereas estimator precision is of
importance for drawing inferences from the Gibbs
sampler output. Therefore, our recommendation is to
first implement a random scan with equal selection
probabilities and then, during the post-processing
phase of the sampler, choose selection probabilities
that minimize the asymptotic variance. The
convergence rate calculations of DKSC are then of utmost
importance for convergence assessment. If the
techniques allow for matrix decompositions or
expressions for the asymptotic variance, the tools
provide for pre- and post-burn-in implementations of
the random scan Gibbs sampler. Furthermore, such
expressions may lend themselves to computationally
inexpensive implementations of both the Gaussian
approximation and adaptive procedures for optimally
selecting random scan Gibbs samplers.